justinchuby
Collaborator

@justinchuby justinchuby commented Oct 15, 2025

Output present key/value from the fused Attention op whenever past key/value is provided. Previously, the fused Attention op consumed past key/value but did not produce present key/value, which is not correct for ORT.
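For readers less familiar with KV caching, here is a minimal sketch of how the present key/value relates to the past key/value. It assumes the usual cache layout of (batch, kv_num_heads, sequence_length, head_size) and that the present tensor is the past tensor concatenated with the current step's tensor along the sequence axis. The helper name `present_shape` is hypothetical and is not part of this PR.

```python
from typing import Tuple

# Assumed layout: (batch, kv_num_heads, sequence_length, head_size)
Shape = Tuple[int, int, int, int]


def present_shape(past: Shape, current: Shape) -> Shape:
    """Shape of the present key (or value) tensor.

    Assumes present = concat(past, current) along the sequence axis.
    """
    batch, heads, past_len, head_size = past
    cur_batch, cur_heads, cur_len, cur_head_size = current
    if (batch, heads, head_size) != (cur_batch, cur_heads, cur_head_size):
        raise ValueError("past and current must agree on batch, heads, and head size")
    return (batch, heads, past_len + cur_len, head_size)


past_key = (1, 8, 16, 64)  # 16 cached tokens
new_key = (1, 8, 1, 64)    # one new token in this decoding step
print(present_shape(past_key, new_key))  # (1, 8, 17, 64)
```

A node that takes the cache as input but never emits the grown cache leaves the next decoding step with nothing to feed back in, which is the gap this change closes.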

Replaces #2632

Signed-off-by: Justin Chu <[email protected]>
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR improves GQA (Grouped Query Attention) fusion so that the fused attention operation outputs the present key and value tensors in addition to the attention result. Because the fused operation consumes past key/value, it now also emits the corresponding present key/value.

  • Modified the pattern function to return present key and value tensors alongside the attention output
  • Updated the rewrite function to specify 3 outputs for the attention operation
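The two changes above can be pictured with a rough sketch. It uses a hypothetical `Node` record and a hypothetical `fuse_attention` helper, not the actual onnxscript rewriter API or the code in this PR. The input order (Q, K, V, optional mask, past key, past value) and the output names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a graph node; not an onnxscript class."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


def fuse_attention(
    q: str,
    k: str,
    v: str,
    past_key: Optional[str] = None,
    past_value: Optional[str] = None,
) -> Node:
    """Build a fused Attention node description.

    When past key/value are consumed, the node also declares
    present key/value outputs, for three outputs in total.
    """
    inputs = [q, k, v]
    outputs = ["attention_out"]
    if past_key is not None and past_value is not None:
        # Empty string marks the omitted optional mask input (assumed position).
        inputs += ["", past_key, past_value]
        outputs += ["present_key", "present_value"]
    return Node("Attention", inputs, outputs)


node = fuse_attention("query", "key", "value", "past_key", "past_value")
print(len(node.outputs))  # 3
```

The point of the sketch is the pairing: a node that lists past key/value among its inputs also lists present key/value among its outputs.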

@justinchuby justinchuby added this to the 0.5.4 milestone Oct 15, 2025

codecov bot commented Oct 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.38%. Comparing base (811937c) to head (3fb7223).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2634   +/-   ##
=======================================
  Coverage   70.38%   70.38%           
=======================================
  Files         222      222           
  Lines       26288    26288           
  Branches     2629     2629           
=======================================
  Hits        18503    18503           
  Misses       6865     6865           
  Partials      920      920           


@justinchuby justinchuby changed the title Improve GQA fusion Improve GQA fusion to produce present key/value Oct 15, 2025
@justinchuby justinchuby changed the title Improve GQA fusion to produce present key/value Fix GQA fusion to produce present key/value Oct 15, 2025
@gramalingam gramalingam merged commit 75b3d42 into main Oct 15, 2025
32 checks passed
@gramalingam gramalingam deleted the justinchu/attention-fuse branch October 15, 2025 18:01
2 participants